A Novel Approach to Creating Disambiguated Multilingual Dictionaries
Authors
Abstract
Multilingual lexicons are needed in various applications, such as cross-lingual information retrieval and machine translation. These applications often suffer from the ambiguity of dictionary entries, especially when an intermediate natural language is involved in constructing the dictionary, since that language adds its own ambiguity to the ambiguity of the working languages. This paper proposes a new method for producing multilingual dictionaries without the risk of introducing additional ambiguity. As a disambiguated intermediate language we use the so-called Universal Words. A set of more than 200,000 unambiguous Universal Words has been constructed automatically on the basis of the well-known English lexical database WordNet. This approach is being used to construct a five-language dictionary in the field of cultural heritage within the framework of the PATRILEX project, sponsored by the Spanish Research Council.
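The ambiguity problem described in the abstract can be illustrated with a minimal sketch. The toy lexicons, the function name `translate_via_pivot`, and the sense labels below are hypothetical and invented for illustration; they are not taken from the paper, and the sense-tagged strings only imitate the general look of Universal Words rather than reproducing actual entries.

```python
# Toy lexicons, invented for illustration only (not data from the paper).

# Pivoting through plain English: the pivot word "bank" is ambiguous.
ES_TO_EN = {"banco": {"bank"}}
EN_TO_FR = {"bank": {"banque", "rive"}}

# Pivoting through sense-tagged entries (hypothetical labels that
# stand in for unambiguous Universal Words).
ES_TO_UW = {"banco": {"bank(icl>financial_institution)"}}
UW_TO_FR = {
    "bank(icl>financial_institution)": {"banque"},
    "bank(icl>shore)": {"rive"},
}


def translate_via_pivot(word, source_to_pivot, pivot_to_target):
    """Collect every target word reachable from `word` through the pivot."""
    targets = set()
    for pivot in source_to_pivot.get(word, set()):
        targets |= pivot_to_target.get(pivot, set())
    return targets


# The ambiguous pivot leaks the "river bank" sense into the result.
print(sorted(translate_via_pivot("banco", ES_TO_EN, EN_TO_FR)))
# The sense-tagged pivot keeps only the intended sense.
print(sorted(translate_via_pivot("banco", ES_TO_UW, UW_TO_FR)))
```

In this toy setting the English pivot yields both "banque" and the spurious "rive", while the sense-tagged pivot yields only "banque". Real dictionaries would of course need many-to-many mappings and a principled sense inventory, which is the role the abstract assigns to the WordNet-derived Universal Words.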
Similar resources
Using Multilingual Topic Models for Improved Alignment in English-Hindi MT
Parallel corpora are often injected with bilingual dictionaries for improved Indian language machine translation (MT). In the absence of such dictionaries, a coarse dictionary may be required. This paper demonstrates the use of a multilingual topic model for creating coarse dictionaries for English-Hindi MT. We compare our approaches with: (a) a baseline with no additional dictionary injection, and...
Extracting Multilingual Dictionaries for the Teaching
This paper describes a method for creating multilingual dictionaries using Wikipedia as a resource. A lucky strike on the road to multilingual information retrieval, the main idea is simple: taking the titles of Wikipedia pages in English and then finding the titles of the corresponding articles in other languages produces a multilingual dictionary in all those languages. While the page content...
The PAPILLON Project: Cooperatively Building A Multilingual Lexical Data-Base To Derive Open Source Dictionaries And Lexicons
The PAPILLON project aims at creating a cooperative, free, permanent, web-oriented and personalizable environment for the development and the consultation of a multilingual lexical database. The initial motivation is the lack of dictionaries, both for humans and machines, between French and many Asian languages. In particular, although there are large F-J paper usage dictionaries, they are usab...
Automatically Creating Multilingual Lexical Resources
The thesis proposes creating bilingual dictionaries and Wordnets for languages without many lexical resources using resources of resource-rich languages. Our work will have the advantage of creating lexical resources, reducing time and cost and at the same time improving the quality of resources created.
Bilingual embeddings with random walks over multilingual wordnets
Bilingual word embeddings represent words of two languages in the same space, and allow knowledge to be transferred from one language to the other without machine translation. The main approach is to train monolingual embeddings first and then map them using bilingual dictionaries. In this work, we present a novel method to learn bilingual embeddings based on multilingual knowledge bases (KB) such as...